Data Exploration of Pilot Summer Program at UCR


Write up

A few things to consider here.

1) We want to show as much data as possible to see where the data gaps are and what we can do so far

2) For the statements where the actor name is "canvasdummyname", we have an exported CSV with the IP addresses that we can try to match against

3) Then we can decide how to edit the xAPI process

The Data Frame

The goal is to get the data frame to go from this JSON format...

    {
        "actor": {
            "name": "canvasdummyname",
            "mbox": "mailto:canvasdummyemail@gmail.com"
        },
        "verb": {
            "id": "http://adlnet.gov/expapi/verbs/answered",
            "display": {
                "en-US": "answered"
            }
        },
        "object": {
            "id": "https://elearn.ucr.edu/courses/3730",
            "definition": {
                "name": {
                    "en-US": " Week 1 Module 1: Moments"
                },
                "description": {
                    "en-US": "Student has answered slide 8.4"
                },
                "type": "http://id.tincanapi.com/activitytype/slide"
            },
            "objectType": "Activity"
        },
        "result": {
            "response": "50",
            "duration": "PT4S",
            "score": {
                "min": 0,
                "max": 6,
                "raw": 3,
                "scaled": 0.5
            },
            "success": false
        },
        "id": "c5a83fe7-1927-4696-ad4b-57bb322ed06a",
        "timestamp": "2021-06-18T07:29:40.802Z",
        "stored": "2021-06-18T07:29:40.802Z",
        "authority": {
            "objectType": "Agent",
            "account": {
                "homePage": "https://xcite-testing.lrs.io/keys/authorization",
                "name": "authorization"
            }
        }
    }

...to something like the table below, instead of a list of complex JSON statements.

| df | name (string) | verb id (string) | object id (string) | object description (string) | result duration (string) | result response (string) | score max (int) | score raw (int) | score scaled (double/float) | success (boolean) | time stamp (string) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Nicole Garcia | http://id.tincanapi.com/verb/viewed | https://elearn.ucr.edu/courses/3730 | Student has viewed video: Truss reaction Forces | PT15S | response here | 6 | 4 | 0.66 | false | 2021-06-18T07:29:40.802Z |
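
One way to get from the nested statement to a single flat row is to walk the nested objects and join the keys with dots. The sketch below uses only the standard library; the `flatten` helper and the dotted column names (for example `result.score.raw`) are assumptions for illustration, not necessarily how the data frame in this notebook was built.

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys, e.g. 'result.score.raw'."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects and keep the parent key as a prefix.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

# A trimmed copy of the example statement shown earlier in this section.
statement = json.loads("""
{
  "actor": {"name": "canvasdummyname", "mbox": "mailto:canvasdummyemail@gmail.com"},
  "verb": {"id": "http://adlnet.gov/expapi/verbs/answered",
           "display": {"en-US": "answered"}},
  "result": {"response": "50", "duration": "PT4S",
             "score": {"min": 0, "max": 6, "raw": 3, "scaled": 0.5},
             "success": false},
  "timestamp": "2021-06-18T07:29:40.802Z"
}
""")

row = flatten(statement)
for column, value in row.items():
    print(column, "=", value)
```

Each flattened statement becomes one row, and the dotted keys become the column names. If pandas is available, `pandas.json_normalize` applied to the list of statements should give a comparable result with the same dotted naming.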

Observations

1) To decipher which statements came from the webpages and which came from Storyline, it looks like the LRS records which access key was used. For right now we both used the same keys so we can use it.

2) The df has 21,343 statements with 22 different columns, which matches the LRS records.

3) The data is too wide

4) verb id and verb desc are essentially the same thing, so we only need one. There are other similar cases

5) Based on the method used to grab the username via the Canvas LMS, I'd be more confident using actor.name than actor.mbox

(I'm not sure if the Canvas string changes per session, though I also think it's possible to change your name through Canvas too, so that could be an issue for another time.)
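
To find the "other similar cases" from observation 4 without eyeballing all 22 columns, one option is to look for column pairs whose values map one-to-one across every row. This is a minimal sketch assuming the statements are already flattened into dicts; the `redundant_pairs` helper, the dotted column names, and the sample rows are made up for illustration.

```python
from itertools import combinations

def redundant_pairs(rows, columns):
    """Return column pairs whose values map one-to-one across all rows."""
    pairs = []
    for a, b in combinations(columns, 2):
        seen = {(row.get(a), row.get(b)) for row in rows}
        a_values = {pair[0] for pair in seen}
        b_values = {pair[1] for pair in seen}
        # One-to-one: each distinct value of a pairs with exactly one value of b,
        # and each distinct value of b pairs with exactly one value of a.
        if len(seen) == len(a_values) == len(b_values):
            pairs.append((a, b))
    return pairs

# Hypothetical flattened rows, only to show the check.
rows = [
    {"verb.id": "http://adlnet.gov/expapi/verbs/answered",
     "verb.display.en-US": "answered", "actor.name": "a"},
    {"verb.id": "http://id.tincanapi.com/verb/viewed",
     "verb.display.en-US": "viewed", "actor.name": "a"},
    {"verb.id": "http://adlnet.gov/expapi/verbs/answered",
     "verb.display.en-US": "answered", "actor.name": "b"},
]
columns = ["verb.id", "verb.display.en-US", "actor.name"]

print(redundant_pairs(rows, columns))
# → [('verb.id', 'verb.display.en-US')]
```

A pair flagged this way is only a candidate for dropping one column. Two columns that are both constant will also be flagged, so the result still needs a quick manual look.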

TODO: Parse ISO 8601 duration format into seconds to make it more sortable
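
A possible starting point for this TODO, using only the standard library. The `duration_to_seconds` name is an assumption, and the sketch only handles the day, hour, minute and second parts (values like "PT4S" or "PT15S" from the examples above). Years, months and weeks are rejected, since years and months have no fixed length in seconds.

```python
import re

# Matches durations such as PT4S, PT1M30.5S, PT2H, or P1DT2H.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+(?:\.\d+)?)D)?"
    r"(?:T(?:(?P<hours>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<minutes>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)

def duration_to_seconds(text):
    """Convert an ISO 8601 duration (days/hours/minutes/seconds only) to seconds."""
    match = _DURATION.match(text)
    # A bare "P" or "PT" matches the pattern but carries no components.
    if match is None or text in ("P", "PT"):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {name: float(value) if value else 0.0
             for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])

print(duration_to_seconds("PT4S"))       # → 4.0
print(duration_to_seconds("PT15S"))      # → 15.0
print(duration_to_seconds("PT1M30.5S"))  # → 90.5
```

Storing the result as a float column next to the original string keeps the raw value available while making the durations sortable and easy to aggregate.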